Parallel Programming with Microsoft Visual Studio 2010 : Introduction to Parallel Tasks

8/11/2011 11:39:10 AM

There are several ways to invoke parallel tasks. This section reviews the various available techniques, starting with the Parallel.Invoke method.

Let us assume that you have three methods (MethodA, MethodB, and MethodC) that have separate data input. If the methods execute sequentially, the execution time would be the sum of the duration of each method, as shown in the following image. In this example, the total time to execute the methods sequentially is 20 milliseconds (ms).

MethodA, MethodB, and MethodC have no dependencies. When the methods are converted to tasks, the tasks are completely independent, and they are said to be embarrassingly parallel. This is important because it means that the tasks can run in parallel without any synchronization, such as a semaphore or monitor. Performance is therefore better, and this portion of the application is more scalable. Furthermore, synchronization can add complexity, which is harder to maintain and debug. When a group of tasks execute in parallel, their duration is the elapsed time of the longest task. In the following image, TaskB takes the longest time (10 ms) to complete. Therefore, the duration of the group of tasks is 10 ms, which is half the duration of the related methods when they are executed sequentially (see the previous illustration). This assumes that there are enough processors (three, in this example) to execute the tasks in parallel and not sequentially.

Tasks are scheduled and then assigned to a thread. The thread is then executed by a processor. In the past, threads—not tasks—were the basic unit of scheduling. A thread is an asynchronous path of execution through your process. A process is itself not active; activity is found in the threads. In fact, the Windows operating system automatically closes a process with no threads, because there is no reason to maintain a process that has no potential activity. Despite this, a process is nonetheless important. It provides the resources needed for a running program, such as the virtual memory, the working set, and the handle table. Threads require these resources to execute. Threads privately own resources as well, most notably the stack and thread local storage.

1. Threads

Tasks run on threads. Threads are the engine of your application and represent running code. Understanding threads helps you understand tasks. This is important because some of the overhead associated with a task pertains to threads. Therefore, threads are especially relevant when discussing parallel programming.

The Windows operating system schedules threads preemptively. Threads are assigned an adjustable priority, and in general, they are scheduled in a round-robin fashion based on thread priority. Here are some of the reasons that a thread might be preempted:

The thread has exceeded its time slice or quantum.
A higher-priority thread starts.
The thread places itself into a wait condition.
An input-output operation occurs.

Nothing is free—including threads. Threads have overhead. Most of the overhead associated with a thread involves the stack, thread local storage, and context switches. The default stack size for each thread is one megabyte (MB). For example, 200 threads would reserve 200 MB of memory for stack space, which is not trivial. Thread local storage is private memory set aside for each thread and can also be significant. In addition to the storage overhead, threads have a performance cost: context switches. Much of this cost is associated with switching between user and kernel mode when swapping the running thread for another user mode thread. The cost of context switches can reduce the benefit of additional threads.

In addition to context switches, there are other costs, such as ramping up and destroying threads. The Microsoft .NET Framework thread pool helps to manage these costs and to abstract much of the complexity of creating, starting, and destroying threads. There is one .NET Framework thread pool per managed application. Thread pools commonly reuse threads to avoid the costs of thread startup and destruction. When the thread is no longer required, the operating system reassigns additional work to the thread or suspends the thread. There is an algorithm for adjusting thread pool size dynamically based on thread utilization and other factors. The .NET Framework 4 thread pool is the default scheduler for parallel tasks in the .NET Framework. When you start a task, it is scheduled to run and placed in a queue as part of the thread pool. Later, the task is dequeued and assigned to run on an available thread. Fortunately, most of this activity is transparent to you.

To recap: A task is a group of related statements. When started, tasks are added to a queue in the thread pool. Eventually (maybe immediately), a task is executed on a thread, which is a unit of execution. This is, of course, a thread from the thread pool. Each thread is assigned to and executed initially on a particular processor, which is considered the processor unit. The following diagram shows the relationship between a task, a thread, and a processor.

2. The Task Class

In the .NET Framework, the Task class is a logical abstraction of a task. You can use this class to schedule and ultimately execute a parallel task. Remember, tasks are unlike threads in that you do not start a task directly. By default, the thread pool schedules a task, places it on a queue in the thread pool, and eventually executes the task on an available thread.

2.1. Using the Parallel.Invoke Method

You can schedule a task in several ways, the simplest of which is by using the Parallel.Invoke method. The following example executes two parallel tasks—one for MethodA and another for MethodB. This version of Parallel.Invoke accepts an array of Action delegates as the sole parameter. Action delegates have no arguments and return void.

Parallel.Invoke(new Action[] { MethodA, MethodB });

The Parallel.Invoke method is convenient for executing multiple tasks in parallel. However, this method has limitations:

Parallel.Invoke creates but does not return task objects.
The Action delegate is limited—it has no parameters and no return value.
Parallel.Invoke is not as flexible as other solutions and always uses an implied Task.WaitAll method, described in more detail later in this section.

Parallel.Invoke does not guarantee the ordering of task execution. In the previous example, the task for MethodB might execute first. When ordering is required, dependency is implied. You should avoid task dependencies whenever possible.

The Parallel.Invoke method does not return until the provided tasks have completed. For example, consider the use of Parallel.Invoke to execute two tasks. The first task’s duration is 500 ms, and the duration of the second task is 250 ms. Parallel.Invoke will return in 500 ms. In other words, the duration of Parallel.Invoke is the execution time of the longest-running task.

2.2. Using the TaskFactory.StartNew method

Another way to execute a task is to use the TaskFactory.StartNew method. The TaskFactory class is an abstraction of a task scheduler. In the Task class, the Factory property returns the default implementation of the task factory, which employs the .NET Framework 4 thread pool. TaskFactory has a StartNew method for executing tasks using the default task factory. Here is sample code that executes two tasks by using TaskFactory.StartNew.

var TaskA = Task.Factory.StartNew(MethodA);
var TaskB = Task.Factory.StartNew(MethodB);

Like the Parallel.Invoke method, TaskFactory.StartNew uses an Action delegate. Unlike the Parallel.Invoke method, TaskFactory.StartNew returns immediately. There is no implied wait. Therefore, how do you know when the task or tasks have completed? Fortunately, the StartNew returns a Task object. You can explicitly wait for a single task by using the Task.Wait method, which is an instance method. A thread waiting on a task is considered to be the joining thread. If the task has already completed, Task.Wait returns immediately. Otherwise, the joining thread will wait (block) until the task has completed. Here’s sample code that executes and waits on a single task.

var TaskA = Task.Factory.StartNew(MethodA);
TaskA.Wait();

The previous example waited on a single task. You can also wait for multiple tasks. After all, an application is not very parallel with a single task. There are two options for waiting for multiple tasks: wait for all the tasks to complete, or wait for any task to complete. Task.WaitAll waits for all tasks to complete. Task.WaitAny returns when any of the referenced tasks have completed. Both methods accept an array of tasks as a parameter. Both Task.WaitAll and Task.WaitAny methods are static methods of the Task class.

Create two tasks from separate methods and wait for both tasks to complete

Create a console application. Before the Main function, add a MethodA and MethodB method. Each method should display the name of the method by using Console.WriteLine.
```
static void MethodA() { Console.WriteLine("MethodA"); }
static void MethodB() { Console.WriteLine("MethodB"); }
```
In the Main function, create and start two tasks with the TaskFactory.StartNew method. Initialize the tasks with the MethodA and MethodB methods.
```
var TaskA = Task.Factory.StartNew(MethodA);
var TaskB = Task.Factory.StartNew(MethodB);
```
You can now wait for both methods to complete with the Task.WaitAll method.
```
Task.WaitAll(new Task[] { TaskA, TaskB });
```

Add Console.ReadLine() call to the end of the Main method to prevent the console from closing before you can see the output. Your completed code should look like the following.

class Program
{
    static void MethodA() { Console.WriteLine("MethodA"); }
    static void MethodB() { Console.WriteLine("MethodB"); }


    static void Main(string[] args)
    {
        var TaskA = Task.Factory.StartNew(MethodA);
        var TaskB = Task.Factory.StartNew(MethodB);
        Task.WaitAll(new Task[] { TaskA, TaskB });
        Console.ReadLine();
    }
}

Run the program. You should see the output from both tasks in the console window.

The example you just completed uses the Task.WaitAll method to wait for both tasks to complete. In contrast, the Task.WaitAny method returns as soon as any task completes. You can discover which task completed because Task.WaitAny returns a zero-based index into the Task array that was provided as a parameter. The array index identifies the task that completed.

You used the Console.ReadLine method to prevent the console window from closing. Of course, this makes it easier to view the results. Alternatively, you can execute the application in Release mode, which you can do by using the shortcut Ctrl+F5. This prevents the addition of an extraneous line of code. Either technique is okay.

In the previous example, you waited for all tasks to complete. In this tutorial, you just wait for the first task to complete. You will then display the task ID of that task.

Create two tasks and wait for just the first one to complete

Create a new console application. At the top of the source file, reference the System.Threading and System.Threading.Tasks namespaces. Before the Main function, add a MethodA and MethodB method. To emulate a compute bound task, you will ask each task to simply spin and burn processor cycles by using the static Thread.SpinWait method.
```
static void MethodA() { Thread.SpinWait(int.MaxValue); }
static void MethodB() { Thread.SpinWait(int.MaxValue/2); }
```
In the Main function, create two tasks with the TaskFactory.StartNew method. Initialize the tasks with the MethodA and MethodB methods.
```
var TaskA = Task.Factory.StartNew(MethodA);
var TaskB = Task.Factory.StartNew(MethodB);
```

Display the ID for each task.

Console.WriteLine("TaskA id = {0}", TaskA.Id);
Console.WriteLine("TaskB id = {0}", TaskB.Id);

Now create an array of tasks. Call Task.WaitAny to wait until either task has run to completion. Initialize an integer variable with the return value.
```
Task [] tasks=new Task[] {TaskA, TaskB};
int whichTask = Task.WaitAny(tasks);
```
WaitAny returns a zero-based index into the task array, which identifies the task that has finished. In this example, you can use the index to display the task ID of that task.
```
Console.WriteLine("Task {0} is the gold medal task.",
   tasks[whichTask].Id);
```

Keep the console open and prevent the application from exiting by using a Console.ReadLine= statement at the end of the application.

Here is the completed code:

using System;

using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace TwoTask_WaitAny
{
    class Program
    {

        static void MethodA() { Thread.SpinWait(int.MaxValue); }
        static void MethodB() { Thread.SpinWait(int.MaxValue/2); }

        static void Main(string[] args)
        {
            var TaskA = Task.Factory.StartNew(MethodA);
            var TaskB = Task.Factory.StartNew(MethodB);

            Console.WriteLine("TaskA id = {0}", TaskA.Id);
            Console.WriteLine("TaskB id = {0}", TaskB.Id);

            var tasks=new Task[] {TaskA, TaskB};
            int whichTask = Task.WaitAny(tasks);
            Console.WriteLine("Task {0} is the gold medal task.",
                tasks[whichTask].Id);

            Console.WriteLine("Press enter to exit");
            Console.ReadLine();

        }
    }
}

Run the application. This time, when either of the methods completes, the Console.WriteLine code will show a message containing the ID of the task that finished first. Based on your code, TaskB will finish first.

Be careful with the Wait methods: Wait, WaitAll, and WaitAny. There is always a possibility of waiting longer than you expect; you might even deadlock. For that reason, you should always consider using the versions of these methods that have a time-out parameter. You can then set a reasonable duration for the task to complete. When the task exceeds the duration, the Wait method will time out and return false. It is important to note that the task itself is not aborted and might run until completion. However, you are no longer waiting. In this circumstance, you might decide to cancel the task.

Because the Wait method has a time-out, using the following code is safer than using the code in the previous examples. The task must complete in the allotted time, or the wait will be released. The joining thread is then allowed to continue.

var TaskA = Task.Factory.StartNew(MethodA);
if (!TaskA.Wait(5000)) {
    Console.WriteLine("Task timed out");
}

2.3. Using the Task.Start Method

There is yet one more way to directly start a task. You can create an instance of the Task class and call the Task.Start method, which is an instance method. One advantage is that you can configure the task in the Task constructor before starting the task. Several of the options, as defined in the TaskCreationOption enumeration, are particularly helpful. The following table lists the options.

Option	Explanation
PreferFairness	This is a suggestion to the TaskScheduler that tasks should execute in an order similar to when they were scheduled.
LongRunning	In the .NET Framework 4, long-running tasks are scheduled on threads not in the thread pool.
AttachToParent	This initiates a task inside another task to create a subtask.

This example code demonstrates a couple of these options.

var TaskA = new Task(MethodA, TaskCreationOptions.LongRunning);
var TaskB = new Task(MethodB, TaskCreationOptions.PreferFairness);

TaskA.Start();
TaskB.Start();

The .NET Framework 4 thread pool is optimized for short tasks. For this reason, long-running tasks can adversely affect the performance of the .NET Framework 4 thread pool. Use the TaskCreationOption.LongRunning option to schedule a long-running task on a dedicated thread that is not part of the thread pool.

The .NET Framework thread pool does not promise to execute tasks in order; more often than not, tasks will run out of sequence. Therefore, you cannot predict the order of execution and should not write code predicated on this requirement. TaskCreationOptions.PreferFairness is a task option that indicates a preference to execute a task in its natural order. However, TaskCreationOptions.PreferFairness is merely a suggestion to execute a task in order, not an absolute directive to the Task Parallel Library (TPL).

2.1.3. Using Function Delegates

Action delegates have no parameters and do not return a value. In some circumstances, this might be inflexible and limiting. For tasks, you can also use the Func<TResult> delegate, where TResult is the return type. When the task completes, the return or result of the task is accessible with the Task<TResult>.Result property. You can check for the result at any time. However, if the task has not completed, accessing the Task<TResult>.Result property will block the joining thread until the task is done. Execution will resume after the task has finished, when the result is returned.

In the next example, a task returns the value 42, which is the answer to life, the universe, and everything . Task<int> indicates that the task returns an integer value.

Lambda expressions are convenient for relatively short tasks.

Create a task that uses a lambda expression

Create a new console application. In the Main function, create and start a new task by using the TaskFactory.StartNew method. The task will return an integer value. In the TaskFactory.StartNew method, initialize the task with a lambda expression. In the lambda expression, return the value of 42.
```
var TaskA = Task<int>.Factory.StartNew(() => {
   return 42; });
```
You can now wait for the task with the TaskA.Wait method.
```
TaskA.Wait();
```
Display the result of the task with the Task.Result property.
Build and run the application.

Here is the complete application.

class Program
{
    static void Main(string[] args)
    {
        var TaskA = Task<int>.Factory.StartNew(() =>
        {
            return 42;
        });
        TaskA.Wait();
        Console.WriteLine(TaskA.Result);
    }
}

Here is another example of starting a task that returns a value. Different from the previous example, this code initializes the Task object in its constructor with a function delegate. Task.Start then executes the task.

Task <int> TaskA=new Task<int>(() => {
    return 42;
});
TaskA.Start();

You now know how to execute a task that has a return value. You might also want to pass information into the task. You pass state information for a task as an additional parameter—the object type (the state information can be any type). The next example employs TaskFactory.StartNew to execute a task. The first parameter is the task, and the second parameter is the state.

Create a task that uses a state object, which in this example is a string

Create a console application. In the Main function, create and start a new task by using the TaskFactory.StartNew method. Initialize the task with a lambda expression. Pass the state object into the lambda expression as a parameter. In the lambda expression, return the length of the lambda expression.
```
var TaskA = Task<int>.Factory.StartNew(val => {
return ((string)val).Length;},
```
The next parameter of this version of the TaskFactory.StartNew method is the state object. You will provide a string value.
```
"On Thursday, the cow jumped over the moon.");
```
You are ready to wait for the task, which is done with the Task.Wait method. You can then display the result of the task by using the Task.Result property. Of course, the answer remains 42, since that is truly the answer to life, the universe, and everything.
Build and run the application.

Here is the complete application.

class Program
{
    static void Main(string[] args)
    {
        var TaskA = Task<int>.Factory.StartNew(val => {
            return ((string)val).Length;},
            "On Thursday, the cow jumped over the moon.");
        TaskA.Wait();
        Console.WriteLine(TaskA.Result);
    }
}

Here’s an altered version of the previous example. Instead of using TaskFactory.StartNew, the task constructor initializes the state object.

var TaskA=new Task<int>(val =>{
    return ((string)val).Length;
}, "On Thursday, the cow jumped over the moon.");

TaskA.Start();

Other -----------------

- jQuery 1.3 : DOM Manipulation - Moving elements

- .NET Debugging : Introduction to the Tools - .NET 2.0—Redistributable & .NET 2.0—SDK

- .NET Debugging : Managed Heap and Garbage Collection

- Context and Interception : Custom Component Services (part 3) - The Transaction Management Service

- Context and Interception : Custom Component Services (part 2) - The Logbook Service

- Context and Interception : Custom Component Services (part 1) - Building a Custom Context Attribute & Installing a Custom Message Sink

- Software Testing with Visual Studio Team System 2008 : Data-driven unit testing

- Software Testing with Visual Studio Team System 2008 : Unit testing an ASP.NET application

- Microsoft Enterprise Library : Error Management Made Exceptionally Easy - Replacing an Exception & Logging an Exception

- Microsoft Enterprise Library : Error Management Made Exceptionally Easy - Diving in with a Simple Example

- iPhone Programming : Connecting to the Network - Embedding a Web Browser in Your App

- iPhone Programming : Connecting to the Network - Detecting Network Status

- Parallel Programming with Microsoft Visual Studio 2010 : Introduction to Parallel Programming - Software Patterns

- Parallel Programming with Microsoft Visual Studio 2010 : Introduction to Parallel Programming - Multicore Computing & Speedup

- Microsoft ASP.NET 3.5 : Web Services for ASP.NET AJAX Applications (part 2) - Consuming AJAX Web Services

- Microsoft ASP.NET 3.5 : Web Services for ASP.NET AJAX Applications (part 1) - Remote Calls via Web Services

- Microsoft ASP.NET 3.5 : AJAX-Enabled Web Services - Implementing the AJAX Paradigm

- The Art of SEO : Measuring Search Traffic (part 2)

- The Art of SEO : Measuring Search Traffic (part 1)

- Programming Excel with VBA and .NET : Tasks in Visual Basic - Do Math